Abstract

A flaw in the greedy approximation algorithm proposed by Zhang et al. for the minimum
connected set cover problem is corrected, and a stronger result on the approximation
ratio of the modified greedy algorithm is established. The results are now consistent
with the existing results on the connected dominating set problem, which is a special
case of the minimum connected set cover problem.

1. Introduction

Let $V$ be a set with a finite number of elements, and
$\mathcal{S} = \{S_i \subseteq V : i = 1, \ldots, n\}$ a collection of subsets of $V$.
Let $G$ be a connected graph with the vertex set $\mathcal{S}$. A connected set cover
(CSC) $\mathcal{R}$ with respect to $(V, \mathcal{S}, G)$ is a set cover of $V$ such that
$\mathcal{R}$ induces a connected subgraph of $G$. The minimum connected set cover (MCSC)
problem is to find a CSC with the minimum number of subsets in $\mathcal{S}$. In [1],
Zhang et al. proposed a greedy approximation algorithm (Algorithm 2 in [1]) for the
MCSC problem and derived an approximation ratio for it. This algorithm has a flaw, and
the approximation ratio is incorrect. In this note, we modify the greedy algorithm to
fix the flaw and establish the approximation ratio of the modified algorithm. The
approximation ratio is with respect to the optimal solution to the set cover problem
$(V, \mathcal{S})$, instead of the optimal solution to the MCSC problem
$(V, \mathcal{S}, G)$, and thus it is stronger than the one obtained in [1].
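
The definition of a connected set cover can be checked mechanically. The following is a minimal sketch, assuming subsets are stored in a list and the auxiliary graph is given as pairs of list indices; the function name `is_connected_set_cover` and this representation are illustrative choices, not part of the original note.

```python
from collections import deque

def is_connected_set_cover(universe, subsets, edges, chosen):
    """True if the subsets indexed by `chosen` cover `universe` and
    induce a connected subgraph of G (edges are pairs of subset indices)."""
    chosen = set(chosen)
    if not chosen:
        return False
    covered = set().union(*(subsets[i] for i in chosen))
    if not set(universe) <= covered:
        return False
    # Adjacency restricted to the chosen vertices of G.
    adj = {i: set() for i in chosen}
    for a, b in edges:
        if a in chosen and b in chosen:
            adj[a].add(b)
            adj[b].add(a)
    # Breadth-first search from an arbitrary chosen vertex.
    start = next(iter(chosen))
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == chosen

subsets = [{1, 2}, {2, 3}, {3, 4}]
path_edges = [(0, 1), (1, 2)]          # G is the path 0 - 1 - 2
print(is_connected_set_cover({1, 2, 3, 4}, subsets, path_edges, {0, 2}))     # False
print(is_connected_set_cover({1, 2, 3, 4}, subsets, path_edges, {0, 1, 2}))  # True
```

In the first call the two chosen subsets cover $V$ but are not joined by an edge of $G$, so they form a set cover that is not a connected set cover.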



2. Greedy Algorithm 

Before stating the algorithm, we introduce the following notations and definitions.
Most of them have also been used in [1]. For two sets $S_1, S_2 \in \mathcal{S}$, let
$\mathrm{dist}_G(S_1, S_2)$ be the length of the shortest path between $S_1$ and $S_2$
in the auxiliary graph $G$, where the length of a path is given by the number of edges;
$S_1$ and $S_2$ are said to be graph-adjacent if they are connected via an edge in $G$
(i.e., $\mathrm{dist}_G(S_1, S_2) = 1$), and they are said to be cover-adjacent if
$S_1 \cap S_2 \neq \emptyset$. Notice that in general, there is no connection between
these two types of adjacency. The cover-diameter $D_c(G)$ is defined as the maximum
distance between any two cover-adjacent sets, i.e.,

$$D_c(G) = \max\{\mathrm{dist}_G(S_1, S_2) \mid S_1, S_2 \in \mathcal{S} \text{ and } S_1, S_2 \text{ are cover-adjacent}\}.$$
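
The cover-diameter can be computed with one breadth-first search per vertex of $G$. The sketch below assumes a connected graph given as index pairs; the name `cover_diameter` and the convention of returning 0 when no two subsets are cover-adjacent are assumptions made for illustration.

```python
from collections import deque

def cover_diameter(subsets, edges):
    """Largest graph distance in G between two cover-adjacent subsets."""
    n = len(subsets)
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    best = 0  # assumed convention when no pair is cover-adjacent
    for i in range(n):
        # BFS distances from subset i in the auxiliary graph.
        dist = {i: 0}
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for j in range(i + 1, n):
            if subsets[i] & subsets[j]:  # cover-adjacent pair
                best = max(best, dist[j])
    return best

subsets = [{1, 2}, {3}, {2, 4}]
print(cover_diameter(subsets, [(0, 1), (1, 2)]))          # 2
print(cover_diameter(subsets, [(0, 1), (1, 2), (0, 2)]))  # 1
```

In the first call the only cover-adjacent pair, $\{1, 2\}$ and $\{2, 4\}$, is not graph-adjacent, while both graph-adjacent pairs are disjoint, which illustrates that the two notions of adjacency are independent.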

At each step of the algorithm, let $\mathcal{R}$ denote the collection of the subsets
that have been selected, and $U$ the set of elements of $V$ that have been covered.
Given $\mathcal{R} \neq \emptyset$ and a set $S \in \mathcal{S} \setminus \mathcal{R}$,
an $\mathcal{R} \to S$ path is a path $\{S_0, S_1, \ldots, S_k\}$ in $G$ such that
(i) $S_0 \in \mathcal{R}$; (ii) $S_k = S$; (iii) $S_1, \ldots, S_k \in \mathcal{S} \setminus \mathcal{R}$.
Let $|P_S|$ denote the length of an $\mathcal{R} \to S$ path $P_S$; it is equal to the
number of vertices of $P_S$ that do not belong to $\mathcal{R}$. Then we define the
weight ratio $e(P_S)$ of $P_S$ as

$$e(P_S) = \frac{|P_S|}{|C(P_S)|}, \tag{1}$$

where $|C(P_S)|$ is the number of elements that are covered by $P_S$ but not covered by
$\mathcal{R}$.

For the greedy algorithm in [1], after the subset with the maximum size is selected
at the first step, only the subsets that are not in $\mathcal{R}$ and are cover-adjacent
with some subset in $\mathcal{R}$ are considered in the following iterations. At some
iteration, there may not exist a subset $S \in \mathcal{S} \setminus \mathcal{R}$ that is
cover-adjacent to a subset in $\mathcal{R}$, and if we only consider cover-adjacent
subsets, then the algorithm will enter a deadlock. Consider a simple example where
$V = \{1, 2, 3, 4\}$, $\mathcal{S} = \{\{1, 2\}, \{1\}, \{2\}, \{2, 3\}, \{4\}\}$, and $G$
is a complete graph. If we apply the greedy algorithm in [1] to this MCSC problem, then
after $\{1, 2\}$ and $\{2, 3\}$ are selected, the algorithm enters a deadlock.
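
The deadlock in this example can be reproduced directly. The sketch below does not reimplement Algorithm 2 of [1]; it only inspects the state after $\{1, 2\}$ and $\{2, 3\}$ have been selected, using a hypothetical helper `cover_adjacent_candidates`.

```python
def cover_adjacent_candidates(subsets, selected):
    """Indices of unselected subsets sharing an element with a selected one."""
    chosen = set(selected)
    return [i for i in range(len(subsets))
            if i not in chosen and any(subsets[i] & subsets[j] for j in chosen)]

universe = {1, 2, 3, 4}
subsets = [{1, 2}, {1}, {2}, {2, 3}, {4}]
selected = [0, 3]  # {1, 2} and {2, 3} have been picked

covered = set().union(*(subsets[i] for i in selected))
uncovered = universe - covered
candidates = cover_adjacent_candidates(subsets, selected)
# Cover-adjacent candidates that would cover at least one new element.
useful = [i for i in candidates if subsets[i] & uncovered]
# G is complete, so every unselected subset is graph-adjacent to the selection.
graph_adjacent = [i for i in range(len(subsets)) if i not in selected]
useful_graph_adjacent = [i for i in graph_adjacent if subsets[i] & uncovered]

print(uncovered)              # {4}
print(candidates, useful)     # [1, 2] []
print(useful_graph_adjacent)  # [4]
```

Element 4 is still uncovered, yet no cover-adjacent candidate can cover it; the only subset that helps, $\{4\}$, is reachable solely through graph adjacency.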

To fix this problem, we modify the greedy algorithm to include not only cover- 
adjacent subsets but also graph-adjacent subsets. The modified greedy algorithm for 
the MCSC problem is presented below. 



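Input: $(V, \mathcal{S}, G)$.

Output: A connected set cover $\mathcal{R}$.

1. Choose $S_0 \in \mathcal{S}$ such that $|S_0|$ is the maximum, and let
$\mathcal{R} = \{S_0\}$ and $U = S_0$.

2. While $V \setminus U \neq \emptyset$ DO

2.1. For each $S \in \mathcal{S} \setminus \mathcal{R}$ which is cover-adjacent or
graph-adjacent with a set in $\mathcal{R}$, find a shortest $\mathcal{R} \to S$ path $P_S$.

2.2. Select $P_S$ with the minimum weight ratio $e(P_S)$ defined in (1), and let
$\mathcal{R} \leftarrow \mathcal{R} \cup P_S$ (add all the subsets of $P_S$ to
$\mathcal{R}$) and $U \leftarrow U \cup C(P_S)$.

End while

3. Return $\mathcal{R}$.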
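
The steps above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the index-based representation, the name `modified_greedy`, the tie-breaking by lowest index, and the treatment of a path that covers nothing new as having infinite weight ratio (so it is chosen only when no other candidate exists) are all assumptions.

```python
from collections import deque
from fractions import Fraction

def modified_greedy(universe, subsets, edges):
    """Return indices of the selected subsets, in order of selection."""
    n = len(subsets)
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Step 1: start from a subset of maximum size.
    first = max(range(n), key=lambda i: len(subsets[i]))
    chosen = [first]
    covered = set(subsets[first])

    # Step 2: repeat until every element of the universe is covered.
    while not set(universe) <= covered:
        in_r = set(chosen)
        # Multi-source BFS from R; parent pointers yield a shortest
        # R -> S path whose vertices after the source lie outside R.
        dist = {i: 0 for i in in_r}
        parent = {}
        queue = deque(sorted(in_r))
        while queue:
            cur = queue.popleft()
            for nxt in sorted(adj[cur]):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    parent[nxt] = cur
                    queue.append(nxt)

        best_key, best_path = None, None
        for s in range(n):
            if s in in_r or s not in dist:
                continue
            # Step 2.1: only cover-adjacent or graph-adjacent candidates.
            if dist[s] != 1 and not (subsets[s] & covered):
                continue
            path, cur = [], s
            while cur not in in_r:
                path.append(cur)
                cur = parent[cur]
            gain = set().union(*(subsets[i] for i in path)) - covered
            # Assumption: an empty gain is ranked after every finite ratio.
            key = (0, Fraction(len(path), len(gain))) if gain else (1, Fraction(len(path)))
            if best_key is None or key < best_key:
                best_key, best_path = key, path
        if best_path is None:
            raise ValueError("no candidate: G is disconnected or V cannot be covered")
        # Step 2.2: add every subset on the selected path.
        chosen.extend(reversed(best_path))
        covered |= set().union(*(subsets[i] for i in best_path))
    return chosen

subsets = [{1, 2}, {1}, {2}, {2, 3}, {4}]
complete = [(a, b) for a in range(5) for b in range(a + 1, 5)]
print(modified_greedy({1, 2, 3, 4}, subsets, complete))  # [0, 3, 4]
```

On the deadlock example from this section the sketch selects $\{1, 2\}$, then $\{2, 3\}$, then the graph-adjacent subset $\{4\}$, and terminates.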



3. Approximation Ratio 

In [1], the approximation ratio of the greedy algorithm is shown to be
$1 + D_c(G) \cdot H(\gamma - 1)$, where $\gamma = \max\{|S| \mid S \in \mathcal{S}\}$ is
the maximum size of all the subsets in $\mathcal{S}$ and $H(\cdot)$ is the harmonic
function. In the proof, the authors assume that for every subset $S^*$ in the optimal
solution $\mathcal{R}_c^*$ to the MCSC problem, at least one of its elements is covered
by the subset $S_0$ selected by the greedy algorithm at step 1. In general, some $S^*$
may not share any common elements with $S_0$. Thus, this assumption is invalid, and
the resulting approximation ratio is incorrect. In the following theorem, we establish
the approximation ratio of the modified greedy algorithm for the MCSC problem. The
proof of this theorem does not require this assumption, and it takes into account the
additional search of graph-adjacent subsets in the modified algorithm. Furthermore,
a stronger result on the approximation ratio is shown in the proof (see Lemma 1).
Specifically, the ratio compares the size of the solution returned by the algorithm
with the size of the optimal solution to the set cover problem, and the latter is never
larger than the size of the optimal solution to the MCSC problem.

Theorem 1. Given an MCSC problem $(V, \mathcal{S}, G)$, the approximation ratio of the
modified greedy algorithm is at most $D_c(G)(1 + H(\gamma - 1))$, where
$\gamma = \max\{|S| \mid S \in \mathcal{S}\}$ is the maximum size of the subsets in
$\mathcal{S}$ and $H(\cdot)$ is the harmonic function.
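
The bound in Theorem 1 is straightforward to evaluate for given values of $D_c(G)$ and $\gamma$. A minimal sketch follows, using exact rational arithmetic; the function names `harmonic` and `ratio_bound` are illustrative.

```python
from fractions import Fraction

def harmonic(m):
    """H(m) = 1 + 1/2 + ... + 1/m, with H(0) = 0."""
    return sum(Fraction(1, k) for k in range(1, m + 1))

def ratio_bound(cover_diameter, gamma):
    """Evaluate D_c(G) * (1 + H(gamma - 1))."""
    return cover_diameter * (1 + harmonic(gamma - 1))

print(ratio_bound(2, 4))         # 17/3
print(float(ratio_bound(2, 4)))
```

For example, with $D_c(G) = 2$ and $\gamma = 4$ the bound is $2(1 + 11/6) = 17/3$.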

Proof. We show a lemma stronger than the above theorem.

Lemma 1. Let $\mathcal{R}^*$ be an optimal solution to the set cover problem
$(V, \mathcal{S})$, and $\mathcal{R}$ the solution returned by the modified greedy
algorithm for the MCSC problem $(V, \mathcal{S}, G)$. Then we have that

$$\frac{|\mathcal{R}|}{|\mathcal{R}^*|} \le D_c(G)\,(1 + H(\gamma - 1)).$$

Let $\mathcal{R}_c^*$ be an optimal solution to the MCSC problem $(V, \mathcal{S}, G)$.
Since $|\mathcal{R}^*| \le |\mathcal{R}_c^*|$, Theorem 1 follows from Lemma 1.

Proof of Lemma 1. The proof is based on the classic charge argument. Each time
a subset $S_0$ (at step 1) or a shortest $\mathcal{R} \to S$ path $P_S$ (at step 2) is
selected to be added to $\mathcal{R}$, we charge each of the newly covered elements
$1/|S_0|$ (at step 1) or $e(P_S)$ defined in (1) (at step 2). During the entire
procedure, each element of $V$ is charged exactly once. Assume that step 2 is completed
in $K - 1$ iterations. Let $P^*_{S_i}$ be the shortest $\mathcal{R} \to S$ path selected
by the algorithm at iteration $i$. Let $w(v)$ denote the charge of an element $v$ in
$V$. Then we have

$$\sum_{v \in V} w(v) = \sum_{i=0}^{K-1} \sum_{v \in C(P^*_{S_i})} w(v) = \sum_{i=0}^{K-1} \sum_{v \in C(P^*_{S_i})} \frac{|P^*_{S_i}|}{|C(P^*_{S_i})|} = \sum_{i=0}^{K-1} |P^*_{S_i}| = |\mathcal{R}|, \tag{2}$$

where $P^*_{S_0} = \{S_0\}$, $|P^*_{S_0}| = 1$, and $C(P^*_{S_0}) = S_0$.
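
As a small sanity check of the charging identity (2), consider the example of Section 2 with $V = \{1, 2, 3, 4\}$ and a complete auxiliary graph. Assume the tie at step 1 is broken so that $S_0 = \{1, 2\}$, followed by the single-vertex paths $\{2, 3\}$ and $\{4\}$; this tie-breaking order is an assumption, since the text does not specify one.

```latex
\begin{align*}
w(1) = w(2) &= \frac{1}{|S_0|} = \frac{1}{2}, \\
w(3) &= \frac{1}{1} = 1, \qquad w(4) = \frac{1}{1} = 1, \\
\sum_{v \in V} w(v) &= \tfrac{1}{2} + \tfrac{1}{2} + 1 + 1 = 3 = |\mathcal{R}|.
\end{align*}
```

The total charge equals the number of selected subsets, as (2) states.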

Suppose that $\mathcal{R}^* = \{S_1^*, \ldots, S_N^*\}$ is a minimum set cover for
$(V, \mathcal{S})$. Since an element of $V$ may be contained in more than one subset of
$\mathcal{R}^*$, it follows that

$$\sum_{v \in V} w(v) \le \sum_{i=1}^{N} \sum_{v \in S_i^*} w(v). \tag{3}$$

Next we will show an inequality which bounds from above the total charge of a
subset in $\mathcal{R}^*$, i.e., for any $S^* \in \mathcal{R}^*$,

$$\sum_{v \in S^*} w(v) \le D_c(G)\,(1 + H(|S^*| - 1)). \tag{4}$$

Let $n_i$ ($i = 0, 1, \ldots, K$) be the number of elements of $S^*$ that have not been
covered by $\mathcal{R}$ after iteration $i - 1$, where step 1 is considered as
iteration 0. Notice that $n_0 = |S^*|$ and $n_K = 0$. Let $\{i_1, \ldots, i_k\}$ denote
the subsequence of $\{i = 0, 1, \ldots, K - 1\}$ such that $n_i - n_{i+1} > 0$, i.e., at
iterations $i = i_1, \ldots, i_k$, at least one element of $S^*$ is covered by
$P^*_{S_i}$ for the first time. For each element $v$ covered at iteration $i_1$, if
$i_1 = 0$, based on the greedy rule at step 1, we have

$$w(v) = e(P^*_{S_0}) \le \frac{1}{|S^*|} = \frac{1}{n_{i_1}}; \tag{5}$$

otherwise, depending on whether a cover-adjacent subset or a graph-adjacent subset is
selected at iteration $i_1$,

$$w(v) = e(P^*_{S_{i_1}}) = \begin{cases} \dfrac{|P^*_{S_{i_1}}|}{|C(P^*_{S_{i_1}})|} & \text{(cover-adjacent)} \\[2ex] \dfrac{1}{|C(P^*_{S_{i_1}})|} & \text{(graph-adjacent)} \end{cases} \le \frac{D_c(G)}{n_{i_1} - n_{(i_1+1)}}. \tag{6}$$

The inequality in (6) is due to three facts: (i) in the cover-adjacent case, $S_{i_1}$
is cover-adjacent with $\mathcal{R}$, leading to $|P^*_{S_{i_1}}| \le D_c(G)$; (ii)
$P^*_{S_{i_1}}$ covers at least $n_{i_1} - n_{(i_1+1)}$ elements of $V$, i.e.,
$|C(P^*_{S_{i_1}})| \ge n_{i_1} - n_{(i_1+1)}$; (iii) $D_c(G) \ge 1$. Combining (5) and
(6) yields

$$w(v) \le \frac{D_c(G)}{n_{i_1} - n_{(i_1+1)}}. \tag{7}$$

The proof in [1] does not consider the case of $i_1 \neq 0$, which leads to an incorrect
bound on $w(v)$.

Consider two cases:

(i) If all the elements of $S^*$ are covered after iteration $i_1$, i.e.,
$n_{(i_1+1)} = 0$, then

$$\sum_{v \in S^*} w(v) \le \sum_{v \in S^*} \frac{D_c(G)}{n_{i_1}} = D_c(G). \tag{8}$$

(ii) If not all the elements of $S^*$ are covered by $\mathcal{R}$ after iteration
$i_1$, $S^*$ becomes cover-adjacent with $\mathcal{R}$ and thus a candidate for being
selected at the following iterations. Then based on the greedy rule at step 2, we have
that for an element $v \in S^*$ covered at iteration $i_j$ ($j = 2, \ldots, k$),

$$w(v) = e(P^*_{S_{i_j}}) \le e(P_{S^*}) = \frac{|P_{S^*}|}{|C(P_{S^*})|} \le \frac{D_c(G)}{n_{i_j}}. \tag{9}$$

Notice that if $P_{S^*}$ is selected at iteration $i_j$, at least $n_{i_j}$ elements
will be covered for the first time, i.e., $|C(P_{S^*})| \ge n_{i_j}$.

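It follows from (7) and (9) that

$$\begin{aligned}
\sum_{v \in S^*} w(v) &\le (n_{i_1} - n_{(i_1+1)}) \frac{D_c(G)}{n_{i_1} - n_{(i_1+1)}} + \sum_{j=2}^{k} (n_{i_j} - n_{(i_j+1)}) \frac{D_c(G)}{n_{i_j}} \\
&= D_c(G) + D_c(G) \sum_{j=2}^{k} \frac{n_{i_j} - n_{i_{(j+1)}}}{n_{i_j}}.
\end{aligned} \tag{10}$$

Here we have used the fact that $n_{(i_j+1)} = n_{i_{(j+1)}}$, with the convention
$i_{(k+1)} = K$. It is because between iteration $i_j$ and iteration $i_{(j+1)}$ no
elements of $S^*$ are covered.

For the summation term in (10), we have the following inequality:

$$\sum_{j=2}^{k} \frac{n_{i_j} - n_{i_{(j+1)}}}{n_{i_j}} \le \sum_{j=2}^{k} \left( \frac{1}{n_{i_j}} + \frac{1}{n_{i_j} - 1} + \cdots + \frac{1}{n_{i_{(j+1)}} + 1} \right) = H(n_{i_2}) \le H(|S^*| - 1). \tag{11}$$

The last inequality is due to the fact that $n_{i_2} \le n_{i_1} - 1 = |S^*| - 1$.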
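
To illustrate the harmonic bound, here is a worked instance with assumed values that do not come from the source: $|S^*| = 5$ and $k = 3$, with $n_{i_1} = 5$, $n_{i_2} = 3$, $n_{i_3} = 1$, and $n_K = 0$ after the last covering iteration. It also assumes the summation term has the form $\sum_{j=2}^{k} (n_{i_j} - n_{i_{(j+1)}}) / n_{i_j}$.

```latex
\begin{align*}
\sum_{j=2}^{3} \frac{n_{i_j} - n_{i_{(j+1)}}}{n_{i_j}}
  &= \frac{3 - 1}{3} + \frac{1 - 0}{1} = \frac{5}{3}, \\
\left( \frac{1}{3} + \frac{1}{2} \right) + \frac{1}{1}
  &= H(3) = \frac{11}{6}, \\
\frac{5}{3} \le \frac{11}{6} &\le H(4) = \frac{25}{12}.
\end{align*}
```

Each term $(n_{i_j} - n_{i_{(j+1)}})/n_{i_j}$ is replaced by a block of consecutive harmonic terms, and the blocks join up into $H(n_{i_2})$.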
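Eqn. (4) is a direct consequence of (8), (10), and (11). Thus, using (2)-(4),

$$\begin{aligned}
|\mathcal{R}| = \sum_{v \in V} w(v) &\le \sum_{i=1}^{N} \sum_{v \in S_i^*} w(v) \\
&\le \sum_{i=1}^{N} D_c(G)\,(1 + H(|S_i^*| - 1)) \\
&\le |\mathcal{R}^*|\, D_c(G)\,(1 + H(\gamma - 1)). \qquad \square
\end{aligned}$$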

Let $n = |V|$ be the number of elements of $V$. Then the approximation ratio of
the modified greedy algorithm is $D_c(G)(1 + H(\gamma - 1)) = O(\ln n)$. Since the set
cover problem is a special case of the MCSC problem where the auxiliary graph $G$ is
complete, and the best possible approximation ratio for the set cover problem is
$O(\ln n)$ (unless NP has slightly superpolynomial time algorithms), the modified greedy
algorithm achieves the order-optimal approximation ratio.



4. Connection with Connected Dominating Set Problem 

A dominating set of a graph is a subset of vertices such that every vertex of the 
graph is either in the subset or a neighbor of some vertex in the subset. The connected 
dominating set (CDS) problem asks for a dominating set of minimum size where the 
subgraph induced by the vertices in the dominating set is connected. It is not difficult 
to show that the CDS problem is a special case of the MCSC problem. Specifically, given
an undirected graph $H = (V, E)$, we can derive an MCSC problem $(V, \mathcal{S}, G)$
from the CDS problem of $H$ as follows:

(i) the universe set $V$ is the vertex set $V$ of $H$;

(ii) for each vertex $v \in V$, create a subset
$S_v = \{v\} \cup \{\text{all neighbors of } v\}$ of $V$ in $\mathcal{S}$;

(iii) the auxiliary graph $G$ is the same as the given graph $H$ except that each vertex
$v$ of $H$ is replaced by $S_v$, as illustrated in Fig. 1.

It can be shown that by exchanging the vertex subset $S_v$ with the vertex $v$, the
optimal solution to the derived MCSC problem is equivalent to the optimal solution to
the CDS problem.
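
The three-step construction above can be sketched as follows. The function name `cds_to_mcsc` and the index-based encoding of the auxiliary graph are assumptions made for illustration.

```python
def cds_to_mcsc(vertices, graph_edges):
    """Derive (V, S, G) from an undirected graph H = (V, E).

    Returns the universe, the list of subsets S_v (in sorted vertex order),
    and the auxiliary graph as pairs of indices into that list.
    """
    order = sorted(vertices)
    index = {v: i for i, v in enumerate(order)}
    neighbors = {v: set() for v in order}
    for u, v in graph_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    universe = set(order)                               # step (i)
    subsets = [{v} | neighbors[v] for v in order]       # step (ii): closed neighborhoods
    aux_edges = [(index[u], index[v]) for u, v in graph_edges]  # step (iii)
    return universe, subsets, aux_edges

# H is the path 1 - 2 - 3 - 4.
universe, subsets, aux_edges = cds_to_mcsc([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(subsets)    # [{1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]
print(aux_edges)  # [(0, 1), (1, 2), (2, 3)]
```

Here the subsets $S_2$ and $S_3$ together cover all four vertices and are adjacent in $G$, matching the connected dominating set $\{2, 3\}$ of the path.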




Figure 1: An illustration of the auxiliary graph $G$ derived from the given graph $H$.

Guha and Khuller propose a greedy algorithm (Algorithm I in their paper) for the CDS
problem with an approximation ratio $2(1 + H(\gamma - 1))$, where
$\gamma = \max\{|S_v| \mid v \in V\}$ and $\gamma - 1$ is the maximum degree of the
vertices in $H$. The modified greedy algorithm for the MCSC problem reduces to the
greedy algorithm of Guha and Khuller when applied to the CDS problem. Notice that
$D_c(G) = 2$ for the derived MCSC problem, since two vertex subsets $S_{v_1}$ and
$S_{v_2}$ overlap if and only if their corresponding vertices $v_1$ and $v_2$ are
adjacent or have at least one common neighbor. We see that the approximation ratio of
the modified greedy algorithm established here is consistent with the one shown by
Guha and Khuller, while the original approximation ratio obtained in [1] is not.
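
As an illustration with an assumed input graph (not taken from the source), let $H$ be the path $v_1 - v_2 - v_3 - v_4$. The derived subsets, one pair attaining the cover-diameter, and the resulting bound are:

```latex
\begin{align*}
S_{v_1} &= \{v_1, v_2\}, & S_{v_2} &= \{v_1, v_2, v_3\}, \\
S_{v_3} &= \{v_2, v_3, v_4\}, & S_{v_4} &= \{v_3, v_4\}, \\
S_{v_1} \cap S_{v_3} &= \{v_2\}, & \mathrm{dist}_G(S_{v_1}, S_{v_3}) &= 2, \\
S_{v_1} \cap S_{v_4} &= \emptyset, & D_c(G) &= 2, \\
\gamma &= 3, & D_c(G)\,(1 + H(\gamma - 1)) &= 2\left(1 + \tfrac{3}{2}\right) = 5.
\end{align*}
```

The pair $S_{v_1}, S_{v_4}$ is at distance 3 in $G$ but is not cover-adjacent, so it does not enter the maximum that defines $D_c(G)$.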